On Handling Negative Transfer and Imbalanced Distributions in Multiple Source Transfer Learning

Authors

  • Jing Gao
  • Liang Ge
  • Kang Li
  • Hung Q. Ngo
  • Aidong Zhang
Abstract

Transfer learning has benefited many real-world applications where labeled data are abundant in source domains but scarce in the target domain. As there are usually multiple relevant domains from which knowledge can be transferred, multiple source transfer learning (MSTL) has recently attracted much attention. However, applying MSTL raises two major challenges. First, without knowledge about the difference between source and target domains, negative transfer occurs when knowledge is transferred from highly irrelevant sources. Second, imbalanced class distributions, where examples of one class dominate, can lead to improper judgment of the source domains’ relevance to the target task. Since existing MSTL methods are usually designed to transfer from relevant sources with balanced distributions, they will fail in applications where these two challenges persist. In this article, we propose a novel two-phase framework to effectively transfer knowledge from multiple sources even in the presence of irrelevant sources and imbalanced class distributions. In the first phase, a supervised local weight scheme assigns a proper weight to each source domain’s classifier based on its ability to predict accurately in each local region of the target domain. The second phase then learns a classifier for the target domain by solving an optimization problem that accounts for both training error minimization and consistency with the weighted predictions obtained from the source domains. A theoretical analysis shows that, as the number of source domains increases, the probability that the proposed approach has an error greater than a bound becomes exponentially small. We further extend the proposed approach to an online processing scenario to conduct transfer learning on continuously arriving data.
Extensive experiments on disease prediction, spam filtering and intrusion detection datasets demonstrate that: (i) the proposed two-phase approach outperforms existing MSTL approaches owing to its ability to tackle the negative transfer and imbalanced distribution challenges, and (ii) the proposed online approach achieves performance comparable to the offline scheme. © 2014 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 7: 254–271, 2014
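
The two phases described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract gives no formulas, so the sketch assumes a k-nearest-neighbor notion of "local region", labels in {-1, +1}, and a regularized least-squares objective for the second phase. All function names, the parameters `n_neighbors`, `mu` and `lam`, and the toy data are hypothetical, and class imbalance receives no special treatment here.

```python
import numpy as np

def local_source_weights(X_lab, y_lab, X_query, source_preds_lab, n_neighbors=2):
    """Phase 1 (sketch): weight each source by its accuracy on the labeled
    target points nearest to each query point."""
    # correct[s, i] is 1.0 when source s labels labeled target point i correctly.
    correct = (source_preds_lab == y_lab[None, :]).astype(float)
    # Squared Euclidean distances, shape (n_query, n_labeled).
    d2 = ((X_query[:, None, :] - X_lab[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :n_neighbors]
    # Local accuracy of every source around every query, shape (n_query, n_sources).
    local_acc = correct[:, nearest].mean(axis=2).T
    totals = local_acc.sum(axis=1, keepdims=True)
    uniform = np.full_like(local_acc, 1.0 / local_acc.shape[1])
    # Normalize per query; use uniform weights if every source is locally wrong.
    safe_totals = np.where(totals > 0, totals, 1.0)
    return np.where(totals > 0, local_acc / safe_totals, uniform)

def with_bias(X):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_target_classifier(X_lab, y_lab, X_unl, weighted_preds, mu=1.0, lam=1e-2):
    """Phase 2 (sketch): linear least squares that trades off training error on
    labeled target data against agreement with the weighted source predictions
    on unlabeled target data. The objective is an assumption of this sketch."""
    A_lab, A_unl = with_bias(X_lab), with_bias(X_unl)
    lhs = A_lab.T @ A_lab + mu * (A_unl.T @ A_unl) + lam * np.eye(A_lab.shape[1])
    rhs = A_lab.T @ y_lab + mu * (A_unl.T @ weighted_preds)
    return np.linalg.solve(lhs, rhs)

def predict(w, X):
    return np.sign(with_bias(X) @ w)

# Hypothetical toy setup: the true target label is the sign of the first feature.
sources = [
    lambda X: np.sign(X[:, 0]),     # a relevant source classifier
    lambda X: np.ones(X.shape[0]),  # a source that always predicts class +1
]
X_lab = np.array([[-2.0, 0.0], [-1.0, 1.0], [1.0, -1.0], [2.0, 0.5]])
y_lab = np.array([-1.0, -1.0, 1.0, 1.0])
X_unl = np.array([[-1.5, 0.2], [-0.5, -0.3], [0.7, 0.4], [1.8, -0.2]])

preds_lab = np.array([h(X_lab) for h in sources])  # shape (n_sources, n_labeled)
preds_unl = np.array([h(X_unl) for h in sources])  # shape (n_sources, n_unlabeled)

weights = local_source_weights(X_lab, y_lab, X_unl, preds_lab)
weighted_preds = (weights * preds_unl.T).sum(axis=1)
w = fit_target_classifier(X_lab, y_lab, X_unl, weighted_preds)
pred = predict(w, X_unl)
```

On this toy data the constant source receives zero weight around the negative points, where it is always wrong, and an equal share around the positive points, where both sources are locally accurate. Discounting an unhelpful source region by region is the behavior the abstract attributes to local weighting.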

Related resources

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where many low-level representations of an image exist, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...

Image alignment via kernelized feature learning

Machine learning is an application of artificial intelligence that is able to automatically learn and improve from experience without being explicitly programmed. The primary assumption of most machine learning algorithms is that the training set (source domain) and the test set (target domain) follow the same probability distribution. However, in most of the real-world application...

A Study on free convective heat and mass transfer flow through a highly porous medium with radiation, chemical reaction and Soret effects

The paper addresses the Soret effect on unsteady free convection flow of a viscous incompressible fluid through a porous medium with high porosity bounded by a vertical infinite moving plate under the influence of thermal radiation, chemical reaction, and heat source. The fluid is considered to be a gray, absorbing, and emitting but non-scattering medium, and Rosseland approximation is consid...

A Comparative Study of the Predictive Factors of Learning Transfer to Workplace in Public and Private Hospitals

Introduction: The effectiveness of organizational training courses depends on learning transfer to the workplace; therefore, identifying the predictors of learning transfer has become one of the necessities in human resource development. On the other hand, recent studies support the idea that the predictors of learning transfer are influenced by the culture and context of each organization. In accord...

Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where it is expensive and time-consuming to obtain adequate labeled data. We propose a novel method named DALRRL, which consists of deep ...

Journal title:
  • Statistical Analysis and Data Mining

Volume 7  Issue 

Pages  -

Publication date 2013